Abstract: We analyze Linear Programming (LP) decoding of
graphical binary codes operating over soft-output, symmetric
and log-concave channels. We show that the error-surface,
separating the domain of correct decoding from the domain of
erroneous decoding, is a polytope. We formulate the
problem of finding the lowest-weight pseudo-codeword as a
non-convex optimization (maximization of a convex function) over a
polytope, with the cost function defined by the channel and the
polytope defined by the structure of the code. This formulation
suggests new provably convergent heuristics for finding the
lowest-weight pseudo-codewords, improving in quality upon
those discussed previously. The algorithm performance is tested on
the example of the Tanner [155,64,20] code over the Additive
White Gaussian Noise (AWGN) channel.

I. Introduction 

Low-Density Parity Check (LDPC) codes are capac- 
ity achieving (in the thermodynamic limit) and are easy 
to decode via message-passing algorithms of the Belief- 
Propagation (BP) type [1], [2], [3]. However, performance 
of the efficient decoder on a given finite code is not ideal, 
resulting in a sizable difference between optimal (Maximum 
A-Posteriori) and the suboptimal decoders observed in the 
asymptotics of Bit-Error-Rates (BERs) at the high Signal- 
to-Noise-Ratios (SNR), in the error floor regime [4]. Errors 
in this extreme regime of the error-floor are mainly due to 
special configurations of the channel noise, called instantons 
[5], correspondent to decoding into pseudo-codewords [6], 
[7] different from any of the codewords of the code. Analysis 
of the instantons and pseudo-codewords in the case of LP 
decoder [8] is of a special interest. LP is a combinatorial 
(zero-temperature) version of BP, thus admitting convenient 
description in terms of the pseudo-codeword polytope [8]. 
The geometric structure associated with the polytope gave 
rise to new decoding techniques related to graph covers [9], 
adaptive processing of the polytope constraints [10], and 
the concept of LP duality [11]. The succinct combinatorial 
formulation of the coding was also useful in terms of 
improving LP and thus reducing the gap between the LP 
and MAP decoders [12], [13], [14], [15], [16]. 

In [17] we suggested an LP-specific heuristic Pseudo-Codeword
Search (PCS) algorithm. The main idea of the algorithm was based
on exploring the Wiberg relation, from [6], [7], between a
pseudo-codeword and an optimal noise configuration which lies on
the median between the pseudo-codeword and the zero-codeword. In
essence, the algorithm of [17] performs a biased walk over the
exterior of the domain of correct LP decoding (surrounding the
zero codeword) and arrives at the error-surface (boundary of the
domain) in a small finite number of steps. The algorithm, tested
on a number of codes over the AWGN channel, showed excellent
performance. For any noise initiation it always approached the
error-surface monotonically in simulations, even though a
monotonicity proof was not provided. Later the algorithm was
generalized to the case of a discrete-output channel
(specifically the Binary Symmetric (BS) channel) in [18], [19],
where the monotonicity proof was given. The technique was also
extended to discover the most probable configurations of
error-vectors in compressed sensing [20].

This paper continues the trend of [17] and analyzes 
the error-surface and the associated low-weight pseudo- 
codewords. We study the domain of correct decoding, 
bounded by the error-surface; formulate the (channel spe- 
cific) problem of finding the most probable configuration 
of the noise leading to a failure (and respective pseudo- 
codeword) as an optimization problem; design an efficient 
heuristic; and illustrate performance of the algorithm on 
the exemplary Tanner [155,64,20] code [21]. The main 
statements of the manuscript are: 

• The domain of correct decoding is a polytope in the
noise space. For a typical code the polytope is likely to
be non-tractable, i.e., requiring a description exponential
in the code size. [Section III]

• The problem of finding the lowest weight pseudo-codeword
of a graphical code over a log-concave symmetric (for example
AWGN) channel is reduced to maximization of a convex function,
associated with the channel, over a polytope, associated with
the code and defined as the cross-section of the decoding
polytope by a plane. [Section IV]

• We suggest the Majorization Optimization Algorithm
(MOA), based on a majorization-minimization [22] approximation
of the aforementioned optimization formulation. We show that
MOA, as well as the previously introduced PCS, are both
monotonic in discovering iteratively the low-weight
pseudo-codewords (effective weight decreases with iterations).
[Section V] Performances of MOA and PCS are tested on the
Tanner code over the AWGN channel in Section VI.

II. Preliminary Discussions and Definitions 

We consider LP decoding [8] of binary LDPC code and 
discuss the problem of finding the most probable configura- 
tion of the noise, so-called instanton, for which the decoding 
fails [17]. Equivalently stated, this is the problem of finding 
the lowest weight (closest to the zero codeword) pseudo- 
codeword of the code. 

The technique we discuss here applies to any soft-output,
symmetric channel where the transition probability, $P(x|\sigma)$,
from the codeword $\sigma$ to the channel output $x$, is a
log-concave function of $x$ (i.e., $-\log P(x|\sigma)$ is a convex
function of $x$). The AWGN channel is our enabling example, with
the Gaussian transition probability of Eq. (1), where $s$ is the
signal-to-noise ratio of the channel,
$\sigma = (\sigma_i = 0,1 \,|\, i = 1,\dots,N)$ is the binary
$N$-bit long codeword launched into the channel, and
$x = (x_i \in \mathbb{R} \,|\, i = 1,\dots,N)$ is the real-valued
signal received by the decoder.

Maximum Likelihood decoding can be formulated as an
LP optimization over the polytope, $\mathcal{P}$, spanned by all the
codewords of the code $\mathcal{C}$,

$$\min_{\sigma \in \mathcal{P}} \sum_i (1-2x_i)\,\sigma_i. \tag{2}$$
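The equivalence between the linear cost of Eq. (2) and nearest-codeword decoding can be checked by brute force on a toy code. The sketch below is illustrative only: the single-parity-check code of length 3 and all function names are assumptions made for this example and do not come from the paper. It relies on the identity $(x_i-\sigma_i)^2 = x_i^2 + (1-2x_i)\sigma_i$, valid for $\sigma_i \in \{0,1\}$, so the two costs differ by a codeword-independent constant.

```python
from itertools import product


def codewords(n, checks):
    """Enumerate all binary words of length n satisfying every parity check."""
    return [w for w in product((0, 1), repeat=n)
            if all(sum(w[i] for i in chk) % 2 == 0 for chk in checks)]


def decode_linear_cost(x, words):
    """Pick the codeword minimizing the linear cost sum_i (1 - 2 x_i) sigma_i."""
    return min(words, key=lambda w: sum((1 - 2 * xi) * si for xi, si in zip(x, w)))


def decode_euclidean(x, words):
    """Pick the codeword closest to x in squared Euclidean distance."""
    return min(words, key=lambda w: sum((xi - si) ** 2 for xi, si in zip(x, w)))


# Hypothetical toy code: one parity check acting on three bits.
WORDS = codewords(3, [(0, 1, 2)])
received = (0.9, 0.8, 0.1)
print(decode_linear_cost(received, WORDS), decode_euclidean(received, WORDS))
```

Enumerating codewords is only feasible for very small codes, which is exactly the intractability that motivates the relaxation discussed next.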
However, the full codeword polytope requires a description
exponentially large in the code size and thus it is not tractable.
Trading optimality for efficiency, the authors of [8] suggested
relaxing the full polytope into a tractable one (stated in terms
of a number of constraints polynomial in the size of the code).
The relaxation, coined LP-decoding, is based on decomposition
of the code into small codes, one per individual check, thus
assuring (by construction) that the set of original codewords
forms a subset of all the corners of the relaxed polytope (the
so-called set of pseudo-codewords). The LP-decoding can be
formulated in multiple ways. Following [17], we choose to
start here with the formulation of LP corresponding to the
so-called zero-temperature version of the Bethe Free Energy
approach of [23]:
where $b$ are beliefs, i.e., proxies for the respective marginal
probabilities. $\mathcal{P}_l$ is a polytope, which we call the large
(LP-decoding) polytope. $\mathcal{P}_l$ depends only on the structure
(graph) of the code (and it does not depend on the channel model).
There are beliefs of two types associated with the two types of
nodes in the parity check graph of the code, $\mathcal{G}$, bits $i$
and checks $\alpha$ respectively; $\sigma_i = 0,1$ represents the
value of bit $i$, and the vector
$\sigma_\alpha = (\sigma_i \,|\, i \sim \alpha;\ \text{s.t.}\ \sum_i \sigma_i = 0 \bmod 2)$
stands for one of the allowed local codewords associated with the
check $\alpha$. Of the conditions in the definition of
$\mathcal{P}_l$, the first two equalities are normalizations (for the
beliefs/probabilities), and the third equality states consistency
between beliefs associated with bits and checks. The last two
inequalities in $\mathcal{P}_l$ ensure that the beliefs
(probabilities) are non-negative. If the channel noise corrupting
the zero codeword is sufficiently weak, i.e., if $|x| \ll 1$, then
$LP_P$ outputs zero, corresponding to successful decoding. However,
if $x$ is sufficiently noisy, $LP_P$ confuses another
pseudo-codeword (typically non-integer) for the codeword, then
giving a strictly negative output, $LP_P < 0$.

The description of $LP_P(x)$ in Eq. (3) can be restated in
terms of a smaller set of beliefs, only bit beliefs
$\beta = (\beta_i = b_i(1) \,|\, i = 1,\dots,N)$. Then the "small
polytope" formulation of Eq. (3) becomes [24], [8]:

$$LP_P(x) = \min_{\beta \in \mathcal{P}_s} \sum_i (1-2x_i)\,\beta_i, \tag{4}$$

where $I_\alpha$ is the subset of bit-nodes contributing to check $\alpha$.
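Membership in the small polytope can be tested directly once its inequalities are written down. The sketch below is a hedged illustration: it assumes that $\mathcal{P}_s$ is given by the standard parity inequalities of the LP relaxation of [8], namely $0 \le \beta_i \le 1$ and, for every check $\alpha$ and every odd-sized subset $I \subseteq I_\alpha$, $\sum_{i\in I}(1-\beta_i) + \sum_{i\in I_\alpha\setminus I}\beta_i \ge 1$. The function name and the toy check are hypothetical.

```python
from itertools import combinations


def in_small_polytope(beta, checks, tol=1e-9):
    """Return True if beta satisfies the box and odd-subset parity inequalities.

    Assumed form of the constraints (standard LP relaxation, see lead-in):
    for each check and each odd-sized subset I of its bits,
    sum_{i in I} (1 - beta_i) + sum_{i in check, i not in I} beta_i >= 1.
    """
    if any(b < -tol or b > 1 + tol for b in beta):
        return False
    for bits in checks:
        for size in range(1, len(bits) + 1, 2):  # odd subset sizes only
            for odd in combinations(bits, size):
                lhs = (sum(1 - beta[i] for i in odd)
                       + sum(beta[i] for i in bits if i not in odd))
                if lhs < 1 - tol:
                    return False
    return True


# Hypothetical toy example: a single parity check on three bits.
CHECKS = [(0, 1, 2)]
print(in_small_polytope((1, 1, 0), CHECKS))   # a codeword
print(in_small_polytope((1, 0, 0), CHECKS))   # odd parity, not a codeword
```

The number of inequalities per check grows exponentially with the check degree but is independent of the code length, which is why the description stays tractable for low-density codes.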

The "large polytope" formulation of the LP-decoding (3)
can also be restated in terms of its dual (the formulation here
is almost identical to DLPD2 of [11])

$$LP_D(x) = \max_{\phi,\theta,\lambda} \Big( \sum_i \phi_i + \sum_\alpha \theta_\alpha \Big), \tag{5}$$

$$\forall i,\ \forall \sigma_i:\quad \sigma_i(1-2x_i) - (1-2\sigma_i)\sum_{\alpha \sim i}\lambda_{i\alpha} \geq \phi_i,$$

$$\forall \alpha,\ \forall \sigma_\alpha:\quad \sum_{i \sim \alpha}\lambda_{i\alpha}(1-2\sigma_i) \geq \theta_\alpha,$$

where $\phi = (\phi_i \,|\, i = 1,\dots,N)$,
$\theta = (\theta_\alpha \,|\, \alpha = 1,\dots,M)$, and
$\lambda = (\lambda_{i\alpha} \,|\, (i,\alpha) \in \mathcal{G})$ are
Lagrangian multipliers (messages) conjugated to the first, second
and third conditions in the original LP (3), respectively.
According to the main (strong duality) theorem of convex
optimization (see many textbooks, e.g., [25]) the results of the
primal problem (3) and the dual problem (5) coincide,
$LP_P = LP_D$.

In this manuscript we are mainly concerned with the
following practical problem: given a finite code, a log-concave
channel (for concreteness and without loss of generality we
will consider the AWGN channel as an example), and the
LP-decoding (in its primal or dual versions), find the most
probable configuration (instanton) of the channel noise, $x$,
imposed on the zero codeword, $\sigma_0 = 0$, which leads to
incorrect decoding. Formally, we are solving the following
"instanton" problem

$$\min_{x \in \mathcal{D}_{\rm ext}} \sum_i x_i^2, \tag{6}$$

where $\mathcal{D}_{\rm ext}$ is defined as the exterior (complement)
of the domain, $\mathcal{D}_{\rm int}$, corresponding to correct
decoding: $LP_P = LP_D = 0$. Thus,
$\mathcal{D}_{\rm ext} = \mathbb{R}^N \setminus \mathcal{D}_{\rm int}$.



III. Domain of Correct Decoding is a Polytope 

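Let us show that $\mathcal{D}_{\rm int}$ is actually a polytope.
Consider the following auxiliary domain of $(x;\theta,\phi,\lambda)$:

$$\mathcal{F}_d = \left\{ \begin{array}{l}
\sum_i \phi_i + \sum_\alpha \theta_\alpha = 0 \\
\forall i,\ \forall \sigma_i:\ \ \sigma_i(1-2x_i) - (1-2\sigma_i)\sum_{\alpha \sim i}\lambda_{i\alpha} \geq \phi_i \\
\forall \alpha,\ \forall \sigma_\alpha:\ \ \sum_{i \sim \alpha}\lambda_{i\alpha}(1-2\sigma_i) \geq \theta_\alpha
\end{array} \right\},$$

constructed from the feasibility region of the dual problem,
$LP_D$, with the zero cost function constraint added. For any
$x \in \mathcal{D}_{\rm int}$ there obviously exists an extended
configuration $(x,\theta,\phi,\lambda)$ from $\mathcal{F}_d$. On the
other hand, if $x \in \mathcal{D}_{\rm ext}$, then $LP_P = LP_D < 0$
(i.e., a pseudo-codeword, different from the zero codeword, is
selected by the LP), and since $LP_D$ is defined as a maximum over
an extension of $\mathcal{F}_d$ (where the first condition in
$\mathcal{F}_d$ is removed) there exists no valid
$(x,\theta,\phi,\lambda)$ from $\mathcal{F}_d$ in this case. One
concludes that $\mathcal{D}_{\rm int}$ coincides with the projection
of $\mathcal{F}_d$ on the $x$ variable

$$\mathcal{D}_{\rm int} = {\rm Proj}_x(\mathcal{F}_d) = \{x \,|\, \exists (\theta,\phi,\lambda)\ \text{s.t.}\ (x;\theta,\phi,\lambda) \in \mathcal{F}_d\}. \tag{7}$$

Since both $\mathcal{F}_d$ and its projection to $x$ are polytopes,
$\mathcal{D}_{\rm int}$ is also a convex domain; moreover it is a
polytope.¹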

Note that the projected polytope is most likely non- 
tractable, in the sense that the number of constraints required 
to describe the polytope is expected to be exponential in the 
dimension of x (size of the code). 

IV. Search for Lowest Weight Pseudo-Codeword
as an Optimization

Noticing that Eq. (6) is stated in terms of the exterior
domain, $\mathcal{D}_{\rm ext}$, which is the complement of
$\mathcal{D}_{\rm int}$, one attempts to formulate a closely related
problem stated in terms of optimization over a convex sub-domain
of $\mathcal{D}_{\rm int}$:

$$Q(\varepsilon) = \min_{x \in {\rm Ball}_\varepsilon} LP(x), \tag{8}$$

where ${\rm Ball}_\varepsilon = \{x : \|x\|_2 \leq \varepsilon\}$ is
the ball of radius $\varepsilon$ (which is convex by construction).
For sufficiently small $\varepsilon$, $LP(x) = 0$ for any
$x \in {\rm Ball}_\varepsilon$, while a gradual increase in
$\varepsilon$ will eventually lead, at some $\varepsilon_*$, to the
appearance of the noise configuration closest to the zero codeword
(in terms of the $l_2$ norm of the AWGN channel), $x_{\rm inst}$,
for which $LP(x_{\rm inst}) < 0$. One concludes that the function of
a single parameter, $Q(\varepsilon)$, jumps from zero at
$\varepsilon < \varepsilon_*$ to some negative value at
$\varepsilon = \varepsilon_*$. Then, $4\varepsilon_*^2$ becomes the
effective distance of the code (under the LP-decoding), and the
optimal argument, $x_*$, of $Q(\varepsilon_*)$ corresponds to the
most probable instanton.

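Using the primal formulation of LP-decoding from Eq. (4)
and combining the minimizations over the $x$ and $\beta$ variables,
one reformulates Eq. (8) as the following optimization problem

$$Q(\varepsilon) = \min_{x \in {\rm Ball}_\varepsilon,\ \beta \in \mathcal{P}_s} \sum_i (1-2x_i)\,\beta_i. \tag{9}$$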
¹ We are thankful to P. Vontobel for pointing out, after reading the first
version of the manuscript, that the statement that the domain of correct
decoding is a polytope (Section III) is closely related to those made in
[26]. See Figs. 11, 12 of [26] as well as the preceding and following
discussions.



One important advantage of this formulation is in the fact
that Eq. (9) is stated as a single optimization problem, in
contrast with the sequential instanton search optimization of
[17], where one optimizes over the noise, then evaluates an
internal minimization (the LP decoding itself) for each
configuration of the noise. Note that the cost function in Eq. (9)
is quadratic (bilinear in $x$ and $\beta$), and that minimizing it
over $x$ leaves a concave function of $\beta$.

Eq. (9) can be simplified further. We expect that the
extremal value will be achieved (at least for sufficiently large
$\varepsilon$) "at the surface" of the ball, i.e., at
$\sum_i x_i^2 = \varepsilon^2$. Replacing
$x \in {\rm Ball}_\varepsilon$ by this equality and performing the
optimization over $x$, we arrive (with the help of the standard
Lagrangian multiplier technique, and also assuming that all
components of the candidate noise vector are positive) at the
following nonlinear optimization problem stated primarily in terms
of the beliefs

$$Q(\varepsilon) = \min_{\beta \in \mathcal{P}_s} \Big( \sum_i \beta_i - 2\varepsilon \sqrt{\sum_i \beta_i^2}\, \Big). \tag{10}$$
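The elimination of $x$ that leads to Eq. (10) can be verified numerically. The following sketch (function names and the test vector are assumptions made for illustration) compares the closed form $\sum_i\beta_i - 2\varepsilon\|\beta\|_2$ with the cost $\sum_i(1-2x_i)\beta_i$ evaluated at $x = \varepsilon\beta/\|\beta\|_2$ and at random points of the sphere of radius $\varepsilon$.

```python
import math
import random


def cost(x, beta):
    """Bilinear cost sum_i (1 - 2 x_i) beta_i for fixed noise x and beliefs beta."""
    return sum((1 - 2 * xi) * bi for xi, bi in zip(x, beta))


def inner_min_closed_form(beta, eps):
    """Closed-form minimum of the cost over the sphere ||x||_2 = eps."""
    return sum(beta) - 2 * eps * math.sqrt(sum(b * b for b in beta))


def minimizing_noise(beta, eps):
    """Noise on the sphere aligned with beta, where the minimum is attained."""
    norm = math.sqrt(sum(b * b for b in beta))
    return [eps * b / norm for b in beta]


def random_point_on_sphere(n, eps, rng):
    """Uniformly distributed point on the sphere of radius eps in n dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [eps * v / norm for v in g]


beta = [1.0, 0.5, 0.5, 0.0]   # arbitrary illustrative beliefs
eps = 0.7
print(inner_min_closed_form(beta, eps), cost(minimizing_noise(beta, eps), beta))
```

Since the minimizing noise is proportional to $\beta$, its components are non-negative whenever the beliefs are, in line with the positivity assumption made in the text.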



This problem can be solved approximately (but efficiently)
via the majorization-minimization iterative method [22],
consisting in upper-bounding the cost function by its linearized
expression, minimizing the upper-bound, and iterating by
shifting the linearization point to the solution received on
the previous step. The linearization (for majorization) at each
iterative step is justified by the following inequality (a
consequence of the Cauchy-Schwarz inequality)

$$\|\beta\|_2 \geq L(\beta;\beta^{(k)}) = \frac{\beta \cdot \beta^{(k)}}{\|\beta^{(k)}\|_2}, \tag{11}$$

which holds for any $\beta$. Then the iterative solution of Eq. (10)
becomes

$$Q^{(k+1)}(\varepsilon) = \min_{\beta \in \mathcal{P}_s} \Big( \sum_i \beta_i - 2\varepsilon L(\beta;\beta^{(k)}) \Big), \tag{12}$$

where $k = 0, 1, \dots$ till convergence, $\beta^{(k)}$ is the
optimal solution of the optimization found at the $k$-th iteration
step, and $\beta^{(k+1)}$ becomes the optimal solution at the
$(k+1)$-th iteration. The optimization problem on the rhs of
Eq. (12) is an LP, i.e., it can be solved efficiently. We also
expect that the iterations over $k$ converge fast. The iterative
procedure will depend on the initiation set at $k = 0$, and
starting from different initial conditions we sample different
local optima.
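The majorization property behind Eqs. (11) and (12) is easy to check numerically: the linearized cost never falls below the true cost of Eq. (10) and touches it at the linearization point. The sketch below uses hypothetical function names and arbitrary test vectors.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def linearization(beta, beta_k):
    """L(beta; beta_k) of Eq. (11): projection of beta on the direction of beta_k."""
    return sum(a * b for a, b in zip(beta, beta_k)) / norm(beta_k)


def true_cost(beta, eps):
    """Objective of Eq. (10)."""
    return sum(beta) - 2 * eps * norm(beta)


def surrogate_cost(beta, beta_k, eps):
    """Objective of Eq. (12): linear in beta, upper bound on true_cost."""
    return sum(beta) - 2 * eps * linearization(beta, beta_k)


beta_k = [0.4, 0.3, 0.2, 0.1]          # arbitrary linearization point
probe = [0.1, 0.6, 0.0, 0.3]           # arbitrary test point
eps = 1.5
print(true_cost(probe, eps), surrogate_cost(probe, beta_k, eps))
```

Because the surrogate equals the true cost at $\beta^{(k)}$ and bounds it from above elsewhere, minimizing the surrogate cannot increase the true cost from one iteration to the next.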

Note that $\varepsilon_*$, defined as the smallest $\varepsilon$ for
which $Q(\varepsilon)$ becomes negative, allows a useful
interpretation in terms of the effective distance of the
corresponding pseudo-codeword. Indeed,
$4\varepsilon_*^2 = w(\beta)$, where

$$w(\beta) = \frac{\big(\sum_i \beta_i\big)^2}{\sum_i \beta_i^2} \tag{13}$$

is the weight of the noise (and of the corresponding
pseudo-codeword) according to the Wiberg formula from [6], [7],
expressing the relation between the direction of the optimal noise
and the distance along the direction from the zero codeword
to the error-surface.
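A minimal sketch of the weight of Eq. (13) follows; the function name and the sample vectors are chosen here for illustration. For a binary codeword the formula returns its Hamming weight, and rescaling $\beta$ leaves the value unchanged.

```python
def effective_weight(beta):
    """Wiberg weight of Eq. (13): (sum_i beta_i)^2 / sum_i beta_i^2."""
    s1 = sum(beta)
    s2 = sum(b * b for b in beta)
    return s1 * s1 / s2


# A binary codeword with three ones, and an arbitrary fractional vector.
print(effective_weight([1, 1, 1, 0, 0]))
print(effective_weight([1.0, 1.0, 0.5, 0.5]))
```

Fractional entries can push the value below the number of non-zero coordinates, which is how a pseudo-codeword can have an effective weight smaller than the Hamming distance of the code.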



Fig. 1. The Figure illustrates the sequential progress (from left to right) of the majorization-minimization procedure. The shaded area corresponds to
$\mathcal{P}_{\rm cone}$. Dashed lines on the left sub-figure show edges of $\mathcal{P}_s$ containing the origin 0. The optimization starts from $\beta^{(0)}$. Thin solid lines are the level curves
of the linear function $L(\beta;\beta^{(k)})$, whose optimum over $\beta$ results in $\beta^{(k+1)}$. The procedure continues till convergence, $\beta^{(k+1)} = \beta^{(k)}$ (achieved with $k = 2$ in
the illustration).



V. Cone Formulation and Majorization 
Optimization Algorithm 

To utilize Eq. (10) for finding the low-weight
pseudo-codewords one needs to scan over the values of
$\varepsilon$, thus performing a one-parametric optimization (over
$\varepsilon$) in addition to the (multi-dimensional) optimization
contained in Eq. (10). The main result of this Section is that
this additional degree of freedom in the optimization is
unnecessary, thus leading to a simplification of Eq. (10).

Let us first show that the vertex of $\mathcal{P}_s$ corresponding to
the pseudo-codeword with the lowest weight is connected by
an edge to the vertex corresponding to the zero-codeword.

Since the weight-function, $w(\beta)$ from Eq. (13), does not
depend on the length of the vector $\beta$, one considers $\beta$ as
the direction in the respective space pointing from the origin,
$0 = (0,\dots,0)$, to a point within the polytope $\mathcal{P}_s$. It
is convenient to parameterize the direction in terms of the
projection to the $\sum_i \beta_i = 1$ plane. Pseudo-codewords
correspond to special values of the $\beta$ vector projected to the
plane, and to find the pseudo-codeword with the minimum weight we
will need to minimize the weight, $w(\beta) = 1/\sum_i \beta_i^2$,
over the cross-section of the polytope by the plane (projection).
One restates the problem as maximization of $\sum_i \beta_i^2$,
which is also equivalent to finding $\beta$ maximizing the distance
to the central point of the plane, $1/N = (1,\dots,1)/N$, within
the polytope. $\mathcal{P}_s$ is projected through the origin to the
plane, thus forming a polytope too (call it the cone polytope)

$$\mathcal{P}_{\rm cone} = \left\{ \beta \ \left|\ \begin{array}{ll}
\forall i: & \beta_i \geq 0 \\
\forall \alpha\ \forall i \in \alpha: & \beta_i \leq \sum_{j \in \alpha,\, j \neq i} \beta_j \\
 & \sum_i \beta_i = 1
\end{array} \right. \right\}. \tag{14}$$

(The projection is understood in the standard projective space
sense, with a line connecting a point within the polytope
with the point of origin, $(0,\dots,0)$, projecting to the point
where the line crosses the plane.) Note that only faces of
$\mathcal{P}_s$ in Eq. (4) with $|I| = 1$ become faces of the cone
polytope, $\mathcal{P}_{\rm cone}$. Further, the maximum of
$\sum_i \beta_i^2$ is attained at some vertex of the polytope. By
construction this vertex corresponds to an edge connecting the
point of origin, $(0,\dots,0)$, with another vertex of the original
polytope corresponding to a pseudo-codeword with the lowest
weight. All the other vertices of $\mathcal{P}_s$, which are not
connected to the origin, are projected to interior points of the
cone polytope $\mathcal{P}_{\rm cone}$, thus showing a higher value
of the weight.

The choice of the cone cross-section in Eq. (14) is
convenient for the purpose of simplifying the optimization
problem (10). It guarantees that the first term in the objective
of Eq. (10) is constant, and thus the term is inessential for
the purpose of optimization. As a result, we arrive at the
following reduced version of Eq. (10) (one less degree of
freedom and a simpler polytope)

$$\max_{\beta \in \mathcal{P}_{\rm cone}} \sqrt{\sum_i \beta_i^2}. \tag{15}$$

According to the discussion above, the solution of Eq. (15)
only describes the optimal direction in the noise space, $x$,
and the respective length is reconstructed from the weight
relation $4\varepsilon_*^2 = w(\beta)$. Thus our final expression for
the optimal noise (instanton), corresponding to the (optimal)
solution $\beta$ of Eq. (15), is

$$x = \beta\, \frac{\sum_i \beta_i}{2 \sum_i \beta_i^2}. \tag{16}$$

The geometrical essence of the cone construction and of the
majorization-minimization procedure is illustrated in Fig. 1.
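The instanton of Eq. (16) can be computed directly from a pseudo-codeword. The sketch below (hypothetical function names, arbitrary test vector) also checks two consequences of the formula: the squared length of the noise equals $w(\beta)/4$, and the LP cost $\sum_i(1-2x_i)\beta_i$ of the pseudo-codeword vanishes at this noise, so the noise sits exactly where $\beta$ ties with the zero codeword.

```python
def effective_weight(beta):
    """Wiberg weight of Eq. (13)."""
    s1 = sum(beta)
    s2 = sum(b * b for b in beta)
    return s1 * s1 / s2


def instanton_noise(beta):
    """Noise of Eq. (16): beta rescaled by sum_i beta_i / (2 sum_i beta_i^2)."""
    scale = sum(beta) / (2 * sum(b * b for b in beta))
    return [scale * b for b in beta]


beta = [1.0, 1.0, 0.5, 0.5]    # arbitrary illustrative pseudo-codeword
x = instanton_noise(beta)
print(x, 4 * sum(v * v for v in x), effective_weight(beta))
```

For an integer codeword the rescaling factor is 1/2, so the noise lies halfway between the zero codeword and that codeword, which is the median picture mentioned in the Introduction.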

A few remarks are in order. First, note that there is some
additional freedom in choosing the objective function in the
optimization over $\beta$. For example, one can replace
$\sqrt{\sum_i \beta_i^2}$ in Eq. (15) by
$\sum_i (\beta_i - 1/N)^2$, and the resulting optimal $\beta$ stays
the same. Second, the majorization-minimization procedure of
Eq. (12) for Eq. (10) extends straightforwardly to any appropriate
choice of the objective function in the reduced optimization, in
particular the choice of Eq. (15), thus resulting in the sequence

$$\beta^{(k+1)} = \arg\max_{\beta \in \mathcal{P}_{\rm cone}} \beta \cdot \beta^{(k)}. \tag{17}$$

Third, the sequence (17) is monotonic by construction, i.e.,
the effective distance can only decrease with the iteration
number $k$, thus proving convergence. Indeed,
$\beta^{(k+1)} \cdot \beta^{(k)} \geq \|\beta^{(k)}\|_2^2$ by
optimality, and Eq. (11) then gives
$\|\beta^{(k+1)}\|_2 \geq \|\beta^{(k)}\|_2$.



The considerations above suggest the following
Majorization-Optimization Algorithm (MOA):

• Start: Initiate a point $\beta^{(0)}$ inside the cone cross-section
$\mathcal{P}_{\rm cone}$ with a random deviation from $(1,1,\dots,1)/N$.
[The sampling step.]

• Step 1: Construct a linear function with the gradient
vector pointing from $(1,1,\dots,1)/N$ to $\beta^{(k)}$, optimize it
inside $\mathcal{P}_{\rm cone}$ according to Eq. (17), and get the new
$\beta^{(k+1)}$. [The majorization-minimization step.]

• Step 2: If $\beta^{(k+1)} \neq \beta^{(k)}$, then go to Step 1.

• End: Output the optimal noise configuration according
to Eq. (16).
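The steps above can be sketched in a few lines. This is a toy illustration under a strong simplifying assumption: instead of solving the LP of Eq. (17) over the constraints of the cone polytope, the polytope is represented by an explicit list of points on the plane $\sum_i\beta_i = 1$, and the linear objective is maximized by scanning that list (a linear function over a polytope attains its maximum at a vertex). The vertex list, the starting point, and all names below are hypothetical and are not derived from any real code.

```python
def dot(u, v):
    """Scalar product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def effective_weight(beta):
    """Wiberg weight of Eq. (13)."""
    return sum(beta) ** 2 / dot(beta, beta)


def instanton_noise(beta):
    """Noise configuration of Eq. (16)."""
    scale = sum(beta) / (2 * dot(beta, beta))
    return [scale * b for b in beta]


def moa(vertices, beta0, max_iter=100):
    """Iterate Eq. (17) over an explicit vertex list; return all visited points."""
    path = [tuple(beta0)]
    for _ in range(max_iter):
        # Step 1: maximize the linear function beta . beta_k over the polytope.
        nxt = max(vertices, key=lambda v: dot(v, path[-1]))
        # Step 2: stop once the iteration reproduces the current point.
        if nxt == path[-1]:
            break
        path.append(nxt)
    return path


# Hypothetical points on the plane sum_i beta_i = 1 (NOT from a real code).
VERTICES = [(0.5, 0.5, 0.0, 0.0), (0.0, 0.0, 0.5, 0.5),
            (0.4, 0.3, 0.3, 0.0), (0.25, 0.25, 0.25, 0.25)]
path = moa(VERTICES, (0.3, 0.3, 0.2, 0.2))   # start near the center (1,...,1)/N
print(path[-1], effective_weight(path[-1]), instanton_noise(path[-1]))
```

For an actual code the vertex list is not available, and Step 1 would be carried out by an LP solver acting on the inequalities of Eq. (14); repeating the run from many random starting points plays the role of the sampling step.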

Like PCS of [17], MOA is sensitive to the choice of the
initial direction in the $\beta$ space, which makes it important
to repeat the sampling step multiple times. Obviously, an
individual sampling event outputs only pseudo-codewords
sharing an edge in $\mathcal{P}_s$ with the zero-codeword, call them
"nearest-neighbors", thus ignoring other pseudo-codewords,
for example those which are "next-nearest-neighbors" to the
zero codeword, i.e., ones sharing an edge with a
pseudo-codeword which shares an edge with the zero-codeword.
Even though the effective distance of these
"next-nearest-neighbors" may be smaller than the effective
distance of some of the "nearest-neighbors", MOA guarantees that
the exact solution of Eq. (15) can only be a "nearest-neighbor".

In the remainder of the Section let us briefly compare
MOA with PCS. The iterative procedure of PCS is analogous
to Eq. (17) and it can be restated as

$$\beta^{(k+1)} = \arg\max_{\beta \in \mathcal{P}_s} \beta \cdot \underbrace{\left( \beta^{(k)} \frac{\sum_i \beta_i^{(k)}}{\sum_i \big(\beta_i^{(k)}\big)^2} - 1 \right)}_{=2x-1}. \tag{18}$$

Note that, with $x$ related to $\beta$ by Eq. (16),
$\partial w(\beta)/\partial \beta = -\frac{2\sum_i \beta_i}{\sum_i \beta_i^2}\,(2x - 1)$,
so clearly PCS aims to approximate $w(\beta)$ linearly inside the
polytope, $\mathcal{P}_s$. The function $w(\beta)$ is a homogeneous
function of degree 0. MOA takes advantage of this fact and
attempts to minimize $w(\beta)$ in the projective space of $\beta$,
indexed by the points of $\mathcal{P}_{\rm cone}$. The value of the
max in (18) is non-negative, and it is exactly zero at
$\beta = \beta^{(k)}$. If $w(\beta) < N$, the vector
$\partial w(\beta)/\partial \beta$ points away from the central
direction $(1,\dots,1)$, and thus the optimization (18) is not
going to increase $w(\beta)$, i.e., under this (weak and easy to
realize) condition the PCS is provably monotonic. Also, as
$\beta \cdot (\partial w(\beta)/\partial \beta) = 0$, PCS, like MOA,
always converges to vertices of $\mathcal{P}_s$ which are the
"nearest-neighbors" of the zero-codeword (the cone origin).
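The properties of $\partial w(\beta)/\partial\beta$ used in this comparison follow from differentiating Eq. (13) and can be checked numerically. The sketch below (hypothetical names, arbitrary test vector) evaluates the analytic gradient, compares it with central finite differences, and verifies the Euler identity $\beta\cdot\partial w/\partial\beta = 0$ that expresses degree-0 homogeneity.

```python
def weight(beta):
    """Wiberg weight of Eq. (13)."""
    s1 = sum(beta)
    s2 = sum(b * b for b in beta)
    return s1 * s1 / s2


def weight_gradient(beta):
    """Analytic gradient: dw/dbeta_i = 2 s1 / s2 - 2 s1^2 beta_i / s2^2."""
    s1 = sum(beta)
    s2 = sum(b * b for b in beta)
    return [2 * s1 / s2 - 2 * s1 * s1 * b / (s2 * s2) for b in beta]


def finite_difference_gradient(beta, h=1e-6):
    """Central finite-difference approximation of the gradient of weight()."""
    grad = []
    for i in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[i] += h
        down[i] -= h
        grad.append((weight(up) - weight(down)) / (2 * h))
    return grad


beta = [0.4, 0.3, 0.2, 0.1]    # arbitrary point on the plane sum_i beta_i = 1
grad = weight_gradient(beta)
print(sum(b * g for b, g in zip(beta, grad)))   # Euler identity, close to zero
```

The analytic gradient equals $-(2\sum_i\beta_i/\sum_i\beta_i^2)(2x-1)$ with $x$ from Eq. (16), which is the direction appearing under the brace in Eq. (18).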

Since PCS works with $\partial w(\beta)/\partial \beta$, and not
directly with $w(\beta)$ like MOA, it "confuses" $w(\beta)$ for being
a homogeneous function of degree 1. Therefore, compared to MOA,
PCS has an additional bias away from the cone origin, thus
suggesting that its convergence is slower and that the resulting
end-points are further away from the cone origin. This assessment
is confirmed in the simulations of the next Section (see, e.g.,
Fig. 2).
Fig. 2. The probability/frequency of occurrence, $p(w)$, of the
pseudo-codewords with effective weight $w$ or smaller for the Tanner [155,64,20]
code [21]. Solid and dashed lines represent results of $10^4$ trials of MOA
and the PCS algorithm of [17], respectively.

VI. Tanner Code Test 

We tested MOA on the popular example of the Tanner
[155,64,20] code [21]. The results are shown in Fig. 2.
We analyzed the effective distance, $w$, of the pseudo-codewords
found as a result of $10^4$ trials (differing in initial
orientation). As in the case of the PCS of [17], the probability
(frequency), $p(w)$, of finding a pseudo-codeword with effective
distance smaller than $w$ grows monotonically with $w$. Like
PCS, the MOA result for the smallest effective distance of
the code is $w_{\min} \approx 16.4037 < 20$, where 20 is the
Hamming distance of the code. However, we also observe that
MOA samples the low-weight "nearest-neighbor" pseudo-codewords
more efficiently than PCS, which is seen in a steeper dependence
of $p(w)$ as a function of $w$ in Fig. 2. As discussed above, we
attribute the better performance of MOA to a stronger bias of the
convergence towards the zero codeword, as well as to a simpler and
more homogeneous (in the low weight sector of the
pseudo-codewords) initiation procedure.
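The quantity $p(w)$ plotted in Fig. 2 is an empirical cumulative frequency over the trials. A minimal sketch of how such a curve can be computed from a list of found weights is given below; the function name is hypothetical, and the sample weights are placeholders chosen for illustration, not results from the simulations reported here.

```python
from bisect import bisect_right


def cumulative_frequency(weights, thresholds):
    """Fraction of trials whose effective weight is at most each threshold."""
    ordered = sorted(weights)
    total = len(ordered)
    return [bisect_right(ordered, t) / total for t in thresholds]


# Placeholder weights for illustration only (NOT data from the paper).
sample_weights = [18.0, 16.5, 21.0, 19.5, 16.5]
print(cumulative_frequency(sample_weights, [16.0, 17.0, 20.0, 22.0]))
```

A steeper rise of this curve at small $w$ means that a larger share of the trials ends in low-weight pseudo-codewords.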

VII. Conclusions and Path Forward 

This paper reports new results related to the analysis of, and
algorithms for discovering, the lowest-weight pseudo-codeword(s)
of the LP decoding of graphical codes operating over soft-output
(log-concave) channels, like the AWGN channel. On
the theoretical side, we show here that the set of correct
decoding is a polytope in the space of noise. We also
formulate the problem of finding the smallest weight noise
(instanton) as an optimization problem, Eq. (15), looking
for a maximum of a convex function over a convex set
(a polytope). The exact solution of the problem is likely
non-tractable, and we suggest a heuristic iterative algorithmic
solution based on the majorization-minimization approach of
optimization theory [22]. We show that convergence of